NSF PAR Search | NSF Public Access Repository


Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

Rout, Litu; Chen, Yujia; Ruiz, Nataniel; Caramanis, Constantine; Shakkottai, Sanjay; Chu, Wen-Sheng (May 2025, ICLR 2025)

Generative models transform random noise into images, while their inversion aims to reconstruct structured noise for recovery and editing. This paper addresses two key tasks: (i) inversion and (ii) editing of real images using stochastic equivalents of rectified flow models (e.g., Flux). While Diffusion Models (DMs) dominate the field of generative modeling for images, their inversion suffers from faithfulness and editability challenges due to nonlinear drift and diffusion. Existing DM inversion methods require costly training of additional parameters or test-time optimization of latent variables. Rectified Flows (RFs) offer a promising alternative to DMs, yet their inversion remains underexplored. We propose RF inversion using dynamic optimal control derived via a linear quadratic regulator, and prove that the resulting vector field is equivalent to a rectified stochastic differential equation. We further extend our framework to design a stochastic sampler for Flux. Our method achieves state-of-the-art performance in zero-shot inversion and editing, surpassing prior works in stroke-to-image synthesis and semantic image editing, with large-scale human evaluations confirming user preference. See our project page https://rf-inversion.github.io/ for code and demo.
Free, publicly-accessible full text available May 1, 2026
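The abstract frames inversion as integrating a rectified flow from an image toward noise while a controller steers the trajectory. The toy Python sketch below shows that general idea under stated assumptions. It is not the authors' implementation. It assumes t = 0 is the data end and t = 1 is the noise end. It also replaces the learned Flux velocity with a hypothetical placeholder `toy_velocity`. Finally, it blends that velocity with a conditional velocity `(y1 - x) / (1 - t)` aimed at a target noise sample `y1`, weighted by a hypothetical strength `gamma`. The paper's actual controller, time convention, and step schedule may differ.

```python
from typing import Callable, List

Vector = List[float]


def controlled_inversion(
    x0: Vector,
    y1: Vector,
    velocity: Callable[[Vector, float], Vector],
    gamma: float,
    n_steps: int = 100,
) -> Vector:
    """Euler-integrate a controlled rectified-flow ODE from data (t=0) toward noise (t=1).

    Hypothetical sketch: the drift blends the model velocity v(x, t) with a
    conditional velocity (y1 - x) / (1 - t) pointing straight at y1.
    gamma = 0 gives plain ODE inversion of `velocity`; gamma = 1 follows the
    straight line to y1.
    """
    x = list(x0)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt  # t never reaches 1, so (1 - t) is never zero
        v = velocity(x, t)
        # Conditional velocity toward the target noise sample y1.
        u_cond = [(yi - xi) / (1.0 - t) for xi, yi in zip(x, y1)]
        # Controlled drift: interpolate between model and conditional velocities.
        drift = [vi + gamma * (ui - vi) for vi, ui in zip(v, u_cond)]
        x = [xi + dt * di for xi, di in zip(x, drift)]
    return x


def toy_velocity(x: Vector, t: float) -> Vector:
    # Hypothetical stand-in for a learned velocity network; not a real model.
    return [-xi for xi in x]


if __name__ == "__main__":
    x0, y1 = [1.0, -0.5], [0.3, 0.8]
    print(controlled_inversion(x0, y1, toy_velocity, gamma=0.0))
    print(controlled_inversion(x0, y1, toy_velocity, gamma=0.5))
    print(controlled_inversion(x0, y1, toy_velocity, gamma=1.0))
```

With `gamma = 0` the loop reduces to plain Euler inversion of the placeholder field. With `gamma = 1` the final step lands on `y1` up to rounding. Intermediate values trade faithfulness to the model's own flow against reaching the chosen noise sample.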
RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control

Rout, Litu; Chen, Yujia; Ruiz, Nataniel; Kumar, Abhishek; Caramanis, Constantine; Shakkottai, Sanjay; Chu, Wen-Sheng (May 2024, https://doi.org/10.48550/arXiv.2405.17401)

The authors propose Reference-Based Modulation (RB-Modulation), a plug-and-play, training-free solution for personalization of diffusion models. Existing training-free methods face challenges in (a) extracting style from reference images without additional style or content text descriptions, (b) avoiding unwanted content leakage from style references, and (c) composing style and content effectively. RB-Modulation addresses these issues using a novel stochastic optimal controller, where a style descriptor encodes the desired attributes through a terminal cost. The induced drift ensures high fidelity to the reference style while adhering to the text prompt. Additionally, the authors introduce a cross-attention-based feature aggregation scheme that decouples content and style from the reference image. With both theoretical justification and empirical validation, RB-Modulation demonstrates precise control of content and style in a training-free manner, while enabling seamless composition—eliminating reliance on external adapters or ControlNets.
Full Text Available
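RB-Modulation steers sampling with a drift induced by a terminal cost on a style descriptor. As a rough illustration only, the Python sketch below shows the generic pattern of adding a cost-gradient term to a sampler's drift and integrating with Euler-Maruyama steps. Every detail here is an assumption and not RB-Modulation's algorithm:

- The "style descriptor" is the identity map.
- The cost is the hypothetical `J(x) = ||x - ref||^2`.
- `zero_drift` stands in for a pretrained diffusion model.
- The gradient is evaluated at the current state, not derived from the optimal controller the paper solves for.

```python
import math
import random
from typing import Callable, List, Optional

Vector = List[float]


def terminal_cost_grad(x: Vector, ref: Vector) -> Vector:
    # Gradient of the hypothetical cost J(x) = ||x - ref||^2 (identity "style descriptor").
    return [2.0 * (xi - ri) for xi, ri in zip(x, ref)]


def guided_sde(
    x0: Vector,
    ref: Vector,
    base_drift: Callable[[Vector, float], Vector],
    strength: float,
    n_steps: int = 50,
    noise_scale: float = 0.0,
    seed: Optional[int] = None,
) -> Vector:
    """Euler-Maruyama for dX = [b(X, t) - strength * grad J(X)] dt + noise_scale dW."""
    rng = random.Random(seed)
    x = list(x0)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        b = base_drift(x, t)
        g = terminal_cost_grad(x, ref)
        # Drift step plus a Gaussian increment with variance noise_scale^2 * dt.
        x = [
            xi + dt * (bi - strength * gi) + noise_scale * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for xi, bi, gi in zip(x, b, g)
        ]
    return x


def zero_drift(x: Vector, t: float) -> Vector:
    # Placeholder for a pretrained model's drift; not a real model.
    return [0.0 for _ in x]


if __name__ == "__main__":
    print(guided_sde([1.0, -1.0], [0.2, 0.4], zero_drift, strength=1.0))
    print(guided_sde([1.0, -1.0], [0.2, 0.4], zero_drift, strength=1.0, noise_scale=0.1, seed=0))
```

Raising `strength` pulls samples harder toward the reference. Setting it to zero leaves only the base drift and noise, which is the sense in which the cost term "induces" the extra drift.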
Leveraging Affect Transfer Learning for Behavior Prediction in an Intelligent Tutoring System

https://doi.org/10.1109/FG52635.2021.9667001

Ruiz, Nataniel; Yu, Hao; Allessio, Danielle A.; Jalal, Mona; Joshi, Ajjen; Murray, Thomas; Magee, John J.; Whitehill, Jacob R.; Ablavsky, Vitaly; Arroyo, Ivon; et al (December 2021, 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021))

In this work, we propose a video-based transfer learning approach for predicting problem outcomes of students working with an intelligent tutoring system (ITS). By analyzing a student's face and gestures, our method predicts the outcome of a student answering a problem in an ITS from a video feed. Our work is motivated by the reasoning that the ability to predict such outcomes enables tutoring systems to adjust interventions, such as hints and encouragement, and to ultimately yield improved student learning. We collected a large labeled dataset of student interactions with an intelligent online math tutor consisting of 68 sessions, in which 54 individual students solved 2,749 problems. We will release this dataset publicly upon publication of this paper. It will be available at https://www.cs.bu.edu/faculty/betke/research/learning/. Working with this dataset, our transfer-learning challenge was to design a representation in the source domain of pictures obtained “in the wild” for the task of facial expression analysis, and to transfer this learned representation to the task of human behavior prediction in the domain of webcam videos of students in a classroom environment. We developed a novel facial affect representation and a user-personalized training scheme that unlocks the potential of this representation. We designed several variants of a recurrent neural network that models the temporal structure of video sequences of students solving math problems. Our final model, named ATL-BP for Affect Transfer Learning for Behavior Prediction, achieves a relative increase in mean F-score of 50% over the state-of-the-art method on this new dataset.
Full Text Available
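The headline result is a 50% relative increase in mean F-score. To make that arithmetic concrete, the Python sketch below computes a macro-averaged F1, which is the unweighted mean of per-class F1 scores, and a relative increase. Reading "mean F-score" as macro-F1 is an assumption. The labels in the usage lines are made-up toy values, not data from the paper.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def relative_increase(new: float, old: float) -> float:
    """(new - old) / old; 0.5 corresponds to a 50% relative increase."""
    return (new - old) / old


if __name__ == "__main__":
    # Hypothetical outcome labels (e.g. 0 = incorrect, 1 = correct), not from the dataset.
    y_true = [0, 0, 1, 1]
    y_pred = [0, 1, 1, 1]
    print(macro_f1(y_true, y_pred))
    print(relative_increase(0.6, 0.4))
```

A relative increase of 0.5 means the new score is 1.5 times the baseline. It does not by itself say how large the absolute gap is.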
